AI art made with GANs (generative adversarial networks) is so new that the art world does not yet understand it well enough to evaluate it. We saw this unfold last month when the French artists’ collective Obvious stumbled into selling their very first AI artwork for $450K at Christie’s.
Many in the AI art community took issue with Christie’s selecting Obvious because they felt there were so many other artists who have been working far longer in the medium and who are more technically and artistically accomplished, artists who have given back to the community and helped to expand the genre. Artists like Helena Sarin.
Sarin was born in Moscow and studied computer science at Moscow Civil Engineering University. She lived in Israel for several years and then settled in the US. While she has always worked in tech, she has moonlighted in applied arts like fashion and food styling. She has played with marrying her interests in programming and art in the past, even taking a Processing class with Casey Reas, but Processing felt a little too much like her day job as a developer. Then two years ago, she landed a gig with a transportation company doing deep learning for object recognition. She used CycleGAN to generate synthetic data sets for her client. Then a light went on and she decided to train CycleGAN with her own photography and artwork.
This is actually a pretty important distinction in AI art made with GANs. With AI art, we often see artists using similar code (CycleGAN, SNGAN, Pix2Pix, etc.) and training with similar data sets scraped from the web. This leads to homogeneity and threatens to make AI art a short-lived genre that quickly becomes repetitive and kitsch. But it doesn’t have to be this way. According to Sarin, there are essentially two ways to protect against this if you are an AI artist exploring GANs.
First, you can race to use the latest technology before others have access to it. This is happening right now with BigGANs. BigGANs produce higher-resolution work, but are too expensive for artists to train using their own images. As a result, much of the BigGAN imagery looks the same regardless of who is creating it. Artists chasing the latest technology must race to make their mark before the BigGAN aesthetic is “used up” and a “BiggerGAN” comes along.
Chasing new technology as the way to differentiate your art rewards speed, money, and computing power over creativity. While I find new technology exciting for art, I feel that the use of tech in and of itself never makes an artwork “good” or “bad.” Both Sarin and I share the opinion that the tech cannot be the only interesting aspect of an artwork for it to be successful and have staying power.
The second way artists can protect against homogeneity in AI art is to ignore the computational arms race and focus instead on training models with your own hand-crafted data sets. By training GANs on your own artwork, you can be assured that nobody else will come up with the exact same outputs. This latter approach is the one taken by Sarin.
Sarin approaches GANs more as an experienced artist would approach any new medium: through lots and lots of experimentation and careful observation. Much of Sarin’s work is modeled on food, flowers, vases, bottles, and other “bricolage,” as she calls it. Working from still lifes is a time-honored approach for artists exploring the potential of new tools and ideas.
Sarin’s still lifes remind me of the early Cubist collage works by Pablo Picasso and Georges Braque. The connection makes sense to me given that GANs function a bit like an early Cubist, fracturing images and recombining elements through “algorithms” to form a completely new perspective. As with Analytic Cubism, Sarin’s work features a limited color palette and a flat, shallow picture plane. We can even see the use of lettering in Sarin’s work that looks and feels like the lettering from the newsprint used in the early Cubist collages.
I was not surprised to learn that Sarin is a student of art history. In addition to Cubism, I see Sarin’s work as pulling from the aesthetic of the German Expressionists. Similar to the woodblock prints of artists like Emil Nolde and Erich Heckel, Sarin’s work has bold, flat patterns and graphic use of black. She also incorporates the textures resulting from the process as a feature rather than hiding them, another signature trait of the Expressionist woodblock printmakers.
I think printmaking is a much better analogy for GANs than the oft-used photography analogy. As with printmaking, the technology behind GANs improves over time. Moving from woodblock to etching to lithography, each step in printmaking represents a step towards more detailed and realistic-looking imagery. Similarly, GANs are evolving towards more detailed and photorealistic outputs, only with GANs, this transition is happening so fast that it can feel like tools become irrelevant every few months. This is particularly true with the arrival of BigGANs, which require too much computing power for independent artists to train with their own data. Instead, they work from a pre-trained model. This computational arms race has many in the AI art community wondering about a question Google research scientist David Ha recently put into words on Twitter:
Will AI art be a never-ending computational arms race that favors those with the most resources and computing power? Or is there room for modern-day Emil Noldes and Erich Heckels, who found innovation and creativity in the humble woodblock long after “superior” printmaking technologies had come along?
Sarin collected her thoughts on this in the paper #neuralBricolage, which she has been kind enough to let us share in full below.
Helena Sarin is an important artist who is just starting to get the recognition she deserves. Her thoughts here form the basis for some of the key arguments about generative art (especially GAN art) moving forward.
#neuralBricolage: An Independent Artist’s Guide to AI Artwork That Doesn’t Require a Fortune
tl;dr With the recent advent of BigGAN and similar generative models trained on millions of images and hundreds of TPUs (tensor processing units), independent artists who have been using neural networks as part of their artistic process might feel disheartened by the limited compute and data resources at their disposal. In this paper I argue that this constraint, inherent in staying independent, might in fact boost artistic creativity and inspire the artist to produce novel and engaging work. The created work is unified by the theme of #neuralBricolage - shaping the interesting and human out of the dump heap of latent space.
Hardly a day passes without the technical community learning about new advances in the domain of generative image modeling. Artists like myself who have been using GANs (generative adversarial networks) for art creation often feel that their work might become irrelevant, since autonomous machine art is looming and generative models trained on all of art history will soon be able to produce imagery in every style and at high resolution. So what options do those of us have who are fascinated by the creative potential of GANs but frustrated by their low-resolution output?
Not that many, it seems. You could join the race, building up your local or cloud compute setup, or start chasing the discounts and promotions of the ubiquitous cloud providers, utilizing their pre-trained models and data sets - the former is prohibitively expensive, the latter good for learning but too limiting for producing unique artwork. The third option is to use these constraints to your benefit.
Here I share the aesthetics I’m after and the techniques I’ve been developing for generating images directly from GANs, within the constraints of modest compute and without scraping huge data sets.
Look at it as an inspirational guide rather than a step-by-step manual.
Setup
In any ML art practice, the artist needs a GPU server, an ML software framework, and data sets. I consider my hardware/software setup to be quite typical - I train all my GANs on a local server equipped with a single GTX 1080 Ti GPU. Compute resource constraints mean that you can only use specific models - in my case CycleGAN and SNGAN_projection, since both can be tuned to train from scratch on a single GPU. With SNGAN I can generate images at resolutions up to 256x256, further upscaling them with CycleGAN.
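For illustration, here is a minimal sketch (PyTorch assumed; the batch size is an illustrative starting point, not a value from this actual setup) of checking the single-GPU budget such a rig implies:

```python
# Minimal single-GPU sanity check (PyTorch). The batch size below is
# illustrative - tune it down until training fits in ~11 GB of VRAM.
import torch

assert torch.cuda.is_available(), "GAN training without a GPU is impractical"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1e9:.1f} GB")  # e.g. a GTX 1080 Ti

IMAGE_SIZE = 256   # the resolution ceiling mentioned above for SNGAN
BATCH_SIZE = 32    # illustrative starting point for a single 11 GB card
```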
Data sets
From the very beginning of my work with GANs I’ve been committed to using my own data sets, composed of my own drawings, paintings, and photography. As Anna Ridler, the ML artist who also works exclusively with her own imagery, rightly suggested in her recent talk at ECCV: “Everyone is working with the same data sets and this narrows the aesthetics.” I covered my approach to data set collection and organization in my recent blog post “Playing a Game of GANstruction.”
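As a concrete, purely illustrative sketch of what a hand-crafted, class-labeled data set can look like on disk (PyTorch/torchvision assumed; the folder names are hypothetical):

```python
# Load a personal, class-labeled data set from folders (torchvision).
# Hypothetical directory layout:
#   my_artwork/flower_paintings/*.jpg
#   my_artwork/still_lifes/*.jpg
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map images to [-1, 1] for GANs
])

dataset = datasets.ImageFolder("my_artwork", transform=transform)  # class = folder name
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
```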
Process
The implications of BigGAN-type models are widely discussed in the machine art community. Gene Kogan recently suggested that “like painting after the advent of the camera, neural art may move towards abstraction as generative models become photorealistic.” And at least in the short term, the move towards abstraction is in a sense inevitable for those of us working under resource constraints, as training on modestly sized data sets and a single GPU will make the model collapse long before it is able to generate realistic images. You also need to deal with the GAN’s low output resolution when training and generating images with constrained resources. Not to despair - GAN chaining and collaging to the rescue! Collage is a time-honored artistic technique - from Picasso to Rauschenberg to Frank Stella, there are many examples to draw from for GAN art.
My workflow for GAN output generation and post-processing usually follows these steps, each of which might yield interesting imagery:
Step 1: Prepare data sets and train SNGAN_projection. The reason I’m using SNGAN is that the projection discriminator allows you to train on and generate several classes of images, for example, flower paintings and still lifes. An interesting consequence of working with images that lack the obvious landmarks or homogeneous textures of ImageNet is that they cause glitches in models expecting ImageNet-type pictures. These glitches cause class cross-contamination and might bring interesting, pleasing effects (or might not - debugging the data sets is quickly becoming a required skill for an ML artist). As a result, the data set’s composition/breakdown is the most important factor in the whole process.
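To make the multi-class point concrete: a projection-style conditional GAN takes a class label alongside the noise vector, so one trained model can emit either class on demand. A hedged sketch, where `G` stands in for a trained SNGAN_projection generator and the latent size and class ids are placeholders rather than the exact interface:

```python
# Class-conditional sampling from a generator G(z, y). `G`, Z_DIM, and the
# class ids are placeholders, not the exact SNGAN_projection interface.
import torch

Z_DIM = 128                    # typical latent size in SNGAN-style models
FLOWERS, STILL_LIFE = 0, 1     # hypothetical class ids from the data set

def sample(G, n, class_id, device="cuda"):
    z = torch.randn(n, Z_DIM, device=device)
    y = torch.full((n,), class_id, dtype=torch.long, device=device)
    with torch.no_grad():
        return G(z, y)         # (n, 3, 256, 256) images in [-1, 1]
```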
The model is then trained until full collapse. I save and monitor the generated samples at a predefined interval, stopping the training and decreasing the interval when I start observing interesting images. This might also prove quite frustrating, as I’ve noticed the universal law of GANs is that the model always produces its most striking images in the iterations between checkpoints, whatever value the saving interval is set to - you’ve been warned.
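A sketch of that monitoring loop, assuming a PyTorch training script; `train_step` and `sample_grid` are hypothetical placeholders for whatever GAN code is actually being run:

```python
# Dump a sample grid every `interval` iterations; shrink the interval (and
# eventually stop) by eye once the outputs start looking interesting.
from pathlib import Path
from torchvision.utils import save_image

Path("samples").mkdir(exist_ok=True)
interval = 1000                          # illustrative starting interval
for it in range(1, 200_001):
    train_step()                         # placeholder: one G/D update
    if it % interval == 0:
        save_image(sample_grid(), f"samples/iter_{it:06d}.png")
        # When samples get interesting, resume with e.g. interval = 200
        # and stop the run before full mode collapse.
```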
Step 2: Generate images and select a couple hundred with some potential. I also generate a bunch of mosaics from these images using Python scripts. This piece from the Shelfie series and Latent Scarf are examples.
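As a minimal illustration of such a mosaic script (not the actual one; the grid and tile sizes are guesses), using Pillow:

```python
# Tile a folder of generated images into a simple grid mosaic (Pillow).
from pathlib import Path
from PIL import Image

def make_mosaic(image_dir, cols=4, rows=4, tile=256):
    paths = sorted(Path(image_dir).glob("*.png"))[: cols * rows]
    mosaic = Image.new("RGB", (cols * tile, rows * tile))
    for i, path in enumerate(paths):
        img = Image.open(path).convert("RGB").resize((tile, tile))
        mosaic.paste(img, ((i % cols) * tile, (i // cols) * tile))
    return mosaic

make_mosaic("selected_outputs").save("mosaic.png")  # hypothetical folder name
```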
Step 3: Use CycleGAN to increase the image resolution. This step involves a lot of trial and error, especially around which images go into the target domain data set (a CycleGAN model is trained to do image-to-image translation, i.e., images from the source domain are translated to the target domain). This step can yield images that stand on their own, like Stand Clear of the Closing Doors Please or Harvest Finale.
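A hedged sketch of how the two domains for such an upscaling run might be prepared, using the trainA/trainB folder convention common to CycleGAN implementations (the folder names and working resolution are illustrative):

```python
# Prepare CycleGAN domains: trainA = low-res SNGAN outputs (naively
# upsampled to the working size), trainB = high-res target imagery.
from pathlib import Path
from PIL import Image

SIZE = 512  # illustrative working resolution

def fill_domain(src_dir, dst_dir):
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for p in Path(src_dir).glob("*.png"):
        img = Image.open(p).convert("RGB").resize((SIZE, SIZE), Image.BICUBIC)
        img.save(Path(dst_dir) / p.name)

fill_domain("sngan_outputs", "datasets/upres/trainA")       # source domain
fill_domain("my_highres_artwork", "datasets/upres/trainB")  # target domain
```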
Step 4: Many SNGAN-generated images have a striking pattern or interesting color composition but lack enough content to stand on their own. The final step, then, is to use such images as part of a collage. I select what I call an anchor image of high resolution (either from step 3 or from some of my cycleGANned drawings). I also developed a set of OpenCV scripts that generate collages based on image similarity, size, and position of the anchor images, with SNGAN images forming the background. My favorite examples are Egon Envy and Om.
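As an illustrative OpenCV sketch of one plausible similarity-guided step (not the actual scripts; file names, sizes, and the fixed paste position are hypothetical): rank background candidates by color-histogram similarity to the anchor, then composite the anchor over the best match.

```python
# Rank background candidates by color-histogram similarity to the anchor
# image, then composite the anchor over the best match (OpenCV).
from pathlib import Path
import cv2
import numpy as np

def color_hist(img):
    h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    return cv2.normalize(h, h).flatten()

anchor = cv2.imread("anchor.png")
candidates = [cv2.imread(str(p)) for p in Path("sngan_outputs").glob("*.png")]
scores = [cv2.compareHist(color_hist(anchor), color_hist(c), cv2.HISTCMP_CORREL)
          for c in candidates]

background = cv2.resize(candidates[int(np.argmax(scores))], (1024, 1024))
background[256:768, 256:768] = cv2.resize(anchor, (512, 512))  # fixed position for demo
cv2.imwrite("collage.png", background)
```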
This process, as often with concept art in general, carries the risk of getting a bit too mechanical - the images might lose novelty and become boring, so it should be applied judiciously and curated ruthlessly. The good news is that it opens new possibilities - the most exciting directions I’ve started exploring recently use GAN outputs:
As designs for craft, in particular for glass bas-reliefs. Thanks to their semi-abstraction and somewhat simplified rendering of often exuberant colors and luminance, they can exhibit an organic, folksy quality. Many generated images are reminiscent of patterns from the Arts & Crafts Movement. It’s still early in the game to share results, but I showed images such as those in this set to experienced potters and glassmakers and got overwhelmingly enthusiastic responses (Surfaces and Stories).
In what I call “computational non-photography” - layering and remixing generated images to create new ones. Indian Summer and Latent Underbrush are examples of this technique.
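As a purely illustrative sketch of such layering (the remixing pipeline isn’t described beyond this, so the blend choices here are guesses), using Pillow:

```python
# Layer two generated images: a multiply blend, then pulled back toward
# the base image. Blend choices are illustrative, not the actual recipe.
from PIL import Image, ImageChops

base = Image.open("gan_layer_1.png").convert("RGB")
overlay = Image.open("gan_layer_2.png").convert("RGB").resize(base.size)

remix = ImageChops.multiply(base, overlay)   # darkening blend of the layers
remix = Image.blend(remix, base, alpha=0.3)  # mix 30% of the base back in
remix.save("remix.png")
```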
Conclusion
Even with the limitations imposed by not having much compute or huge data sets, GANs are a great medium to explore precisely because generative models are still imperfect and surprising when used under these constraints. Once their output becomes as predictable as Instagram filters and BigGAN comes pre-built in Photoshop, it will be a good time to switch to a new medium.